This work introduces a novel approach for auxiliary task guidance (ATG). Our goal is to
obtain effective guidance from a suitable auxiliary task by utilizing the uncertainty in the
gradients calculated for a mini-batch of samples. The method calculates a probabilistic
fitness factor of the auxiliary task gradient for each of the shared weights to guide the
main task at every training step of mini-batch gradient descent. We show that this factor
incorporates task-specific confidence of learning to manipulate ATG in an effective manner.
To study the potency of the method, monocular visual odometry (VO) has been chosen as an
application. Substantial experiments have been done on the KITTI VO dataset for solving
monocular VO with a simple convolutional neural network (CNN) architecture. The results
show that our ATG method significantly boosts the performance of supervised learning for VO.
It also outperforms the state-of-the-art (SOTA) auxiliary guided methods that we applied to
VO. The proposed method achieves decent scores (in some cases competitive) compared to
existing SOTA supervised monocular VO algorithms, while keeping an exceptionally low
parameter count in the supervised regime.
1. INTRODUCTION

Recent research shows that various forms of multi-task learning (MTL) are being used to boost the
performance of neural networks (NN) beyond their capacity [1]. The goal of MTL is to maximize the performance
of several tasks by learning them together, while auxiliary task learning is directly concerned with
better performance of the primary task [2]. In this work, we present an effective approach to gain guidance
from auxiliary tasks. The novelty of the proposed approach is that it controls the contribution
of the auxiliary task gradient for each shared weight by measuring how well that gradient fits the main task's
loss gradient distribution. To investigate the effectiveness of the proposed method, we have chosen the
complex problem of monocular visual odometry (VO) pose estimation. Geometric approaches to VO [3]-[6]
work very well in known environments, but require consistent camera calibration [7]. Learning-based
approaches, in contrast, are more robust to inconsistent environments [8]. Complex deep learning (DL)
architectures capture the high complexity of the VO problem, but they suffer from limitations such as
higher inference time, larger memory requirements, and a tendency to overfit. Simpler
architectures may strike a balance between these challenges, but merely using a simple architecture is not good
enough for solving complex VO problems [9]. Performance boosting techniques such as MTL and auxiliary task
learning (ATL) can be a solution here. Costante and Ciarfuglia [10] and Yang et al. are examples where MTL
approaches have been embraced in pose estimation. Using the proposed ATG method, we successfully solve the
monocular VO pose estimation problem with a relatively simple architecture.

Developing better guidance methods for MTL and ATL is a primary research question. Chen et al.
normalize gradients to balance learning between multiple tasks. Yu et al. manipulate the
directions of the gradients to provide better guidance. Du et al. quantify the similarity between tasks by
measuring the cosine similarity between the gradient vectors of two tasks and tuning a suitable
threshold for the similarity value. Our proposed approach measures similarity in a more precise manner by
considering the gradient of each shared weight separately. Unlike existing approaches, we also weigh the
similarity with the task-specific confidence of learning. In the chosen application field of VO, traditional
geometric methods, including [6] and [5], produced state-of-the-art solutions for pose estimation, but they are
prone to motion drift. Supervised learning-based methods address this challenge because they are more robust to
unstable environments [15]. However, they bring the additional challenge of requiring complex architectures or
a huge number of hyper-parameters [16]-[21]. Due to these conflicts, most recent works focus
on unsupervised learning and succeed by a much higher margin [11], [22], [23]. Among MTL-based
supervised approaches to the VO problem, latent space VO (LS-VO) jointly learns a low-dimensional optical flow
(OF) subspace with pose estimation, but still works in a huge parameter space. Our proposed ATG
approach enables a supervised learning method to perform well in a much lower parameter space than
existing methods for the complex problem of VO. At a glance, the contributions of this paper are: i) proposing a
new approach for providing effective and better guidance from an auxiliary task; ii) demonstrating the
effectiveness of the proposed method by solving the monocular VO problem, with OF subspace learning as the
auxiliary task, using a tiny network compared to existing supervised models.


2. METHODOLOGY 
2.1. NN architecture specification 

The complete architecture, illustrated in Figure 1, can be divided into two major sections: the encoder
section and the task-specific section. The encoder section is a modified FlowNet architecture [24] in which the
depth of every layer is reduced by half. This section is the feature extractor of the framework and is shared by
both tasks. The task-specific section consists of three parts, dedicated to translation estimation, rotation
estimation and flow image prediction. The rotation and translation estimation parts of the network are based on
separate sequences of dense fully connected layers, and a decoder network estimates OF.
Figure 1. Visual representation of the architecture and algorithm


We scaled down the input images from (1241, 376, 3) to (320, 128, 3), which reduces the number of parameters
and helps to overcome overfitting without compromising accuracy. Images were normalized to be centered
at 0, ranging from -0.5 to +0.5. The network takes two consecutive images from a
sequence of the KITTI VO dataset and stacks them depth-wise as input. In Figure 1, the output X is generated
after the image pair passes through the 9 shared layers of the encoder. Each shared layer block consists of a
2D convolution layer, batch normalization and a leaky rectified linear unit (leaky ReLU). The flow image
prediction section uses X directly as its input. Flattened X is used as the input of the translation estimation
and rotation estimation sections, both consisting of dense layers. The final outputs of the entire framework are
the 6 degrees of freedom (DOF) pose estimate and the flow image prediction.
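
The preprocessing described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the experiments: it assumes 8-bit RGB frames stored as nested lists (height x width x channel), a nearest-neighbour resize, and normalization by dividing by 255 and subtracting 0.5. The function names and the resize method are hypothetical choices made for the example.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an H x W x C nested list (assumed resize method)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def normalize(img):
    """Map 8-bit values from [0, 255] to [-0.5, 0.5]."""
    return [[[v / 255.0 - 0.5 for v in px] for px in row] for row in img]


def stack_pair(frame_t, frame_t1):
    """Concatenate two H x W x 3 frames along the channel axis -> H x W x 6."""
    return [[p0 + p1 for p0, p1 in zip(row0, row1)]
            for row0, row1 in zip(frame_t, frame_t1)]


def make_input(frame_t, frame_t1, out_h=128, out_w=320):
    """Resize, normalize and stack two consecutive frames into one network input."""
    frames = [normalize(resize_nearest(f, out_h, out_w)) for f in (frame_t, frame_t1)]
    return stack_pair(frames[0], frames[1])


# Tiny stand-in frames (4 x 6 x 3) instead of real KITTI images.
dark = [[[0, 0, 0] for _ in range(6)] for _ in range(4)]
bright = [[[255, 255, 255] for _ in range(6)] for _ in range(4)]
x = make_input(dark, bright, out_h=2, out_w=3)
print(len(x), len(x[0]), len(x[0][0]))  # height, width, channels of the stacked input
```

With the default arguments the stacked input has 128 rows, 320 columns and 6 channels, which matches the (320, 128, 3) frame size quoted above for each of the two images.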


Boosting auxiliary task guidance: a probabilistic approach (Irfan Mohammad Al Hasib) 


98 0 ISSN: 2252-8938 


2.2. Probabilistic auxiliary task guidance 

As shown in Figure 1, the network learns three tasks with three different loss functions: translation
loss ($L_{trans}$) and rotation loss ($L_{rot}$) (as main task loss), and OF subspace learning loss ($L_{flow}$)
(as auxiliary task loss). In ATG learning, the total loss is usually defined as:


$$L_{total}(\theta, \phi_{main}, \phi_{aux}) = \beta_m L_{main}(\theta, \phi_{main}) + \beta_a L_{aux}(\theta, \phi_{aux})$$

A shared weight $\theta$ can be updated as:

$$\theta_{new} = \theta - \alpha \left( \beta_m \frac{1}{N} \sum_{i=1}^{N} \frac{dL^i_{main}}{d\theta} + \beta_a \frac{1}{N} \sum_{i=1}^{N} \frac{dL^i_{aux}}{d\theta} \right)$$


Here, $\alpha$ = learning rate, $\beta_m$ = main task loss coefficient, $\beta_a$ = auxiliary task loss coefficient,
$\theta$ = weights of the shared layers, $\phi_{main}$ = weights of the task-specific layers for the main task,
$\phi_{aux}$ = weights of the task-specific layers for the auxiliary task, and $N$ = mini-batch size. One of the key
research questions in ATL is how to choose an optimum coefficient ($\beta_a$) that encourages positive transfer and
blocks negative transfer from an appropriate auxiliary task [1], [14]. Our approach answers this question by
tuning $\beta_a$ initially and then modulating it with a probabilistic factor, calculated for each shared weight,
that prioritizes assistance from the auxiliary task with respect to its guiding capability. In this section, we
present the approach of calculating this factor and discuss how it allows us to integrate both task-specific
confidence of learning and task similarity in the guidance process.
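
As a point of reference for what follows, the plain ATG update with a fixed auxiliary coefficient can be written for a single shared weight as below. This is only a sketch in the usual gradient-descent convention (the step is taken against the gradient); the function name `atg_update` and the example numbers are made up for illustration.

```python
def atg_update(theta, grads_main, grads_aux, alpha, beta_m, beta_a):
    """One gradient-descent step on a single shared weight with a fixed auxiliary coefficient."""
    n = len(grads_main)
    mean_main = sum(grads_main) / n  # (1/N) * sum_i dL_main^i / d(theta)
    mean_aux = sum(grads_aux) / n    # (1/N) * sum_i dL_aux^i / d(theta)
    return theta - alpha * (beta_m * mean_main + beta_a * mean_aux)


# Two per-sample gradients per task for one weight (illustrative values only).
theta_new = atg_update(theta=1.0, grads_main=[1.0, 3.0], grads_aux=[0.0, 2.0],
                       alpha=0.1, beta_m=1.0, beta_a=0.5)
print(theta_new)
```

Here every weight receives the same auxiliary coefficient; the probabilistic factor introduced next replaces that single constant with a per-weight value.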

From the central limit theorem, we can say that the gradients of a mini-batch belong to a certain normal
distribution. Let the mean of the gradients for the main task, $\frac{1}{N}\sum_{i=1}^{N} \frac{dL^i_{main}}{d\theta}$,
be $\mu_m$ and the mean of the gradients for the auxiliary task be $\mu_a$. The distributions of gradients for
the main and auxiliary task are denoted as $M(\mu_m, \sigma_m^2)$ and $A(\mu_a, \sigma_a^2)$ respectively. The
probabilities of $\mu_a$ in both distributions are expressed as $P(\mu_a|\mu_m,\sigma_m^2)$ and
$P(\mu_a|\mu_a,\sigma_a^2)$. The calculated $P(\mu_a|\mu_m,\sigma_m^2)$ indicates what probability $\mu_a$ would
have if it belonged to the distribution of the main task, $M$. In other words, it signifies how well $\mu_a$ fits
the current distribution $M$ as a random numeric value. Hence, it is a reasonable parameter for deciding how much
the auxiliary task loss should contribute to the total loss. It can be incorporated as a multiplication
factor on the auxiliary task loss coefficient. But empirical values of gradients and their variances reveal that
the probability $P(\mu_a|\mu_m,\sigma_m^2)$ can vary within a very wide range ($10^1 \sim 10^4$), as shown in
Figure 2(b). Hence, using the probability merely as the multiplication factor makes the gradients unstable by
changing them drastically. Two types of scaling are used to handle this issue. In the first method, we divide
these probabilities by their maximum value among the weights of the respective layer, which restricts the factor
to the interval (0, 1]. This method is named the probabilistic factor (PF) method.


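$$\theta_{new} = \theta - \alpha \left( \beta_m \frac{1}{N} \sum_{i=1}^{N} \frac{dL^i_{main}}{d\theta} + \frac{P(\mu_a|\mu_m,\sigma_m^2)}{P_{max}} \beta_a \frac{1}{N} \sum_{i=1}^{N} \frac{dL^i_{aux}}{d\theta} \right) \qquad (1)$$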


where $P_{max}$ is the maximum value of the probability in the respective layer. However, probability values with
small magnitude are at risk of vanishing, especially when variance values are high. Taking the log of the
probability does not help much with this issue. So, in the second method, we propose to scale the probability
$P(\mu_a|\mu_m,\sigma_m^2)$ by dividing it by $P(\mu_a|\mu_a,\sigma_a^2)$, which is the probability of the same
variable $\mu_a$ in its own distribution. We denote this method as the probability ratio factor (PRF) method. It
performed best in our experiments, for reasons explained in later sections. We define the ratio of the two
probabilities as the relative probability factor, $\rho(m, a)$, and update (1) as (3).


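$$\rho(m, a) = \frac{P(\mu_a|\mu_m,\sigma_m^2)}{P(\mu_a|\mu_a,\sigma_a^2)} \qquad (2)$$

So,

$$\theta_{new} = \theta - \alpha \left( \beta_m \frac{1}{N} \sum_{i=1}^{N} \frac{dL^i_{main}}{d\theta} + \rho(m, a)\, \beta_a \frac{1}{N} \sum_{i=1}^{N} \frac{dL^i_{aux}}{d\theta} \right) \qquad (3)$$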
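
The two scaling schemes can be compared side by side on a toy layer. The sketch below assumes that the per-sample gradients of every shared weight are available as plain lists, uses the (1/N) variance estimate, and adds a small `eps` to the variance to avoid division by zero; the helper names (`mean_var`, `gaussian_pdf`, `layer_factors`) and the `eps` safeguard are assumptions made for the example rather than details taken from the method description.

```python
import math


def mean_var(samples, eps=1e-12):
    """Mean and (1/N) variance of per-sample gradients; eps avoids a zero variance."""
    n = len(samples)
    mu = sum(samples) / n
    return mu, sum((g - mu) ** 2 for g in samples) / n + eps


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def layer_factors(main_grads, aux_grads):
    """main_grads[k] / aux_grads[k]: the N per-sample gradients of shared weight k.

    Returns (pf, prf): one PF value and one PRF value per weight of the layer.
    """
    p_main, p_own = [], []
    for g_m, g_a in zip(main_grads, aux_grads):
        mu_m, var_m = mean_var(g_m)
        mu_a, var_a = mean_var(g_a)
        p_main.append(gaussian_pdf(mu_a, mu_m, var_m))  # P(mu_a | mu_m, sigma_m^2)
        p_own.append(gaussian_pdf(mu_a, mu_a, var_a))   # P(mu_a | mu_a, sigma_a^2)
    p_max = max(p_main)
    pf = [p / p_max if p_max > 0 else 0.0 for p in p_main]  # scaled by the layer maximum
    prf = [pm / po for pm, po in zip(p_main, p_own)]        # ratio of the two densities
    return pf, prf


# Toy layer with two shared weights and two samples per mini-batch.
main = [[0.9, 1.1], [0.9, 1.1]]
aux = [[0.8, 1.2], [1.9, 2.1]]  # weight 0 agrees with the main task, weight 1 does not
pf, prf = layer_factors(main, aux)
print(pf, prf)
```

For the first weight the auxiliary mean coincides with the main mean, so PF equals 1 and PRF reduces to the ratio of the two standard deviations; for the second weight both factors are close to zero, which suppresses the auxiliary gradient.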


The value of $\rho(m, a)$ is constrained within a suitable range (shown in Figure 2(a), 2(b), and 2(c)). Because
the ratio does not change drastically between consecutive training steps, the learning process becomes more
stable. For the purpose of analysis, we introduce two novel terms, task confidence $\zeta$ and task similarity
$\tau$, as:


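$$P(\mu_a|\mu_m,\sigma_m^2) = \frac{1}{\sigma_m\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\mu_a-\mu_m}{\sigma_m}\right)^2\right); \quad \zeta(m) = \frac{1}{\sigma_m\sqrt{2\pi}}; \quad \tau(m, a) = \exp\left(-\frac{1}{2}\left(\frac{\mu_a-\mu_m}{\sigma_m}\right)^2\right)$$

The term $\frac{1}{\sigma_m\sqrt{2\pi}}$ is inversely proportional to the standard deviation ($\sigma_m$) of the
distribution. For a particular mini-batch, lower variance of the gradients indicates stable learning.
Intuitively, we interpret it as a measure of confidence of this distribution $M$. The term
$\frac{\mu_a-\mu_m}{\sigma_m}$ measures the distance between $\mu_a$ and $\mu_m$ in units of $\sigma_m$. So,
mathematically, $\tau(m, a)$ measures similarity (not distance) very strictly if the distribution has low
variance, and vice versa. For the auxiliary task, $\tau(a, a) = \exp\left(-\frac{1}{2}\left(\frac{\mu_a-\mu_a}{\sigma_a}\right)^2\right) = 1$.
So, from (2), the probability ratio is:

$$\rho(m, a) = \frac{P(\mu_a|\mu_m,\sigma_m^2)}{P(\mu_a|\mu_a,\sigma_a^2)} = \frac{\zeta(m)\,\tau(m, a)}{\zeta(a)\,\tau(a, a)} = \frac{\zeta(m)}{\zeta(a)}\,\tau(m, a) \qquad (4)$$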


Here, we define $\zeta(m)/\zeta(a)$ as the relative task confidence of the main task compared to the auxiliary
task, which helps the main task in different scenarios, summarized in Table 1. The most desired scenario for ATG
is when the main task is learning with relatively higher confidence and the auxiliary task gradient still has
good similarity (despite the high confidence value of the main task). Thus our approach ensures efficient and
effective guidance from the auxiliary task by avoiding negative transfer in critical scenarios.
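
The decomposition of the ratio into relative task confidence and task similarity can be checked numerically. The sketch below evaluates the four scenarios of Table 1 with made-up means and standard deviations, chosen only so that "high" and "low" are clearly separated; the function names and all numbers are illustrative assumptions.

```python
import math


def task_confidence(sigma):
    """zeta = 1 / (sigma * sqrt(2 * pi)): larger when the gradient spread is small."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))


def task_similarity(mu_a, mu_m, sigma_m):
    """tau(m, a): 1 when the means coincide, decaying with their distance in units of sigma_m."""
    return math.exp(-0.5 * ((mu_a - mu_m) / sigma_m) ** 2)


def prf(mu_m, sigma_m, mu_a, sigma_a):
    """rho(m, a) = zeta(m) / zeta(a) * tau(m, a)."""
    rel_conf = task_confidence(sigma_m) / task_confidence(sigma_a)
    return rel_conf * task_similarity(mu_a, mu_m, sigma_m)


# (mu_m, sigma_m, mu_a, sigma_a) for each scenario; sigma_a is fixed at 1.0.
scenarios = {
    "case 1": (0.0, 0.5, 0.75, 1.0),  # high relative confidence, low similarity
    "case 2": (0.0, 0.5, 0.10, 1.0),  # high relative confidence, high similarity
    "case 3": (0.0, 2.0, 3.00, 1.0),  # low relative confidence, low similarity
    "case 4": (0.0, 2.0, 0.40, 1.0),  # low relative confidence, high similarity
}
rho = {name: prf(*args) for name, args in scenarios.items()}
for name, value in rho.items():
    print(f"{name}: rho = {value:.3f}")
```

With these numbers the factor is largest in case 2 and smallest in case 3, with cases 1 and 4 in between, which is the ordering the scenario table describes.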
Figure 2. Probability ratio, raw probabilities and normalized probabilities for the main task and auxiliary
task. Data is collected for one random weight of layer 1 over 1,000 consecutive training steps. The left plot is
for the 15th epoch and the right plot is for the 30th epoch among 30 epochs: (a) probability ratio, (b) raw
probabilities, and (c) normalized probabilities.


Table 1. Possible scenarios of the PRF method

Scenario   $\zeta(m)/\zeta(a)$   $\tau(m, a)$   Auxiliary task guidance
Case 1     high                  low            moderate guidance
Case 2     high                  high           helps positive guidance
Case 3     low                   low            blocks negative guidance
Case 4     low                   high           moderate guidance


From the above definitions, we can write $P(\mu_a|\mu_m,\sigma_m^2) = \zeta(m)\tau(m, a)$. The
probabilities vary within a wide range, around ($10^1 \sim 10^4$) [Figure 2(b)]. Since the task similarity
$\tau(m, a)$ lies between 0 and 1, the value of $\zeta$ effectively controls the range of probability values.
Consequently, the correlation between $\zeta(m)$ and $\zeta(a)$ shown in Figure 3 causes the correlation between
the probabilities demonstrated in Figure 2(c). That is why the scaling effectively keeps the probability ratio
$\rho(m, a)$ within a suitable range [Figure 2(a)]. The relative probability factor reduces the effect of common
sources of variance among the distributions. Thus it emphasizes the task-specific variance, which gives
information only about the relative performance of the tasks. This is explained in detail in section 2.4. Also,
auxiliary task gradients are more stable than those of the main task (discussed in section 2.3). So, the relative
probability $\rho(m, a)$ gives us a better estimate of the relative performance of the main task compared to the
auxiliary task. Note that the parameter $\beta_a$ is a constant coefficient (tuned as a hyperparameter) for all
the weights, but $\rho(m, a)$ is calculated separately for each of the weights at every mini-batch gradient
update.


[Figure 3 plots. Left panel: one curve per epoch (legend entries Epoch_10 to Epoch_30), x-axis "i-th weight" from 0 to 100. Right panel: x-axis "Training epochs", Epoch_5 to Epoch_30.]


Figure 3. Correlation of $\zeta(a)$ and $\zeta(m)$ for 100 different weights $\theta_i$ (x-axis). Here,
$\zeta(m)$ and $\zeta(a)$ are collected from 1,000 consecutive steps of a single epoch, and
corr($\zeta(m)$, $\zeta(a)$) for the $i$-th weight is measured from these 1,000 confidence pairs. To summarize
the information, the right-side plot shows the numerical average of the left plot for each epoch.


2.3. Optical flow subspace learning as auxiliary task 

The auxiliary task learned by our network is estimating a lower-dimensional representation of the dense OF
field from one pair of raw red green blue (RGB) images. The chosen auxiliary task is easier to learn
than the main task. This is because the latent OF representation estimation loss is measured from the entire OF
image, whereas the odometry loss is measured from 6 sparse pose values. Also, we learn the OF subspace by
minimizing the pixel-wise root mean squared log error (RMSLE), while precise raw OF learning requires a root
mean squared error (RMSE) loss. This makes the learning task even easier [10]. Finally, the capability of the OF
task to help pose estimation can be seen from the results of the vanilla ATG experiment in Table 2.
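
To make the difference between the two losses concrete, the sketch below compares RMSE and RMSLE on a few pixel values. It assumes the common definition of RMSLE, computed on log(1 + value) with non-negative inputs; the text does not spell out the exact formula, so this definition and the sample values are assumptions.

```python
import math


def rmse(pred, target):
    """Root mean squared error over paired values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def rmsle(pred, target):
    """RMSE computed on log(1 + value); assumes non-negative pixel values."""
    return math.sqrt(sum((math.log1p(p) - math.log1p(t)) ** 2
                         for p, t in zip(pred, target)) / len(pred))


# Illustrative flow-image pixel intensities (not taken from the dataset).
target = [10.0, 100.0, 200.0]
pred = [12.0, 150.0, 120.0]
print(f"RMSE  = {rmse(pred, target):.3f}")
print(f"RMSLE = {rmsle(pred, target):.3f}")
```

Because the logarithm compresses large values, RMSLE penalizes relative rather than absolute deviations, so large absolute errors on bright pixels weigh far less than under RMSE. This is consistent with the remark that the subspace objective is an easier target than precise raw OF regression.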


Table 2. Comparative results of different approaches

Sequence   NN                   ATG                  PF                   PRF
           t_rel      r_rel     t_rel      r_rel     t_rel      r_rel     t_rel      r_rel
04         13.5811    2.0449    12.0222    2.2133    11.7648    0.9098    15.3180    1.1278
05         10.9609    2.1660    11.9858    2.1768    9.4700     1.8930    6.9010     1.4982
07         13.1821    4.5040    11.6105    4.3484    8.0245     3.7180    6.9114     2.5165
10         21.9880    8.0013    19.3150    3.9233    13.5378    4.2081    11.6025    3.4710

t_rel: Mean translational RMSE drift (%) on length of 100m-800m.
r_rel: Mean rotational RMSE drift (deg/100m) on length of 100m-800m.


2.4. Sources of variance in gradients 


In this section, we analyze the sources of variance in the task-specific loss gradients calculated
with respect to the shared weights. Using the chain rule of differentiation, the loss gradient can be expressed
as:

$$G_t = \frac{dl_t}{d\theta} = \frac{dz}{d\theta} \times \frac{dA}{dz} \times \frac{dl_t}{dA} \qquad (5)$$

where $z = \theta x + b$, $A = f(z)$, $x$ = input coming from the previous layer, $l_t$ = loss function for task
$t$, $\theta$ = shared weight, $b$ = bias, and $f$ = activation function. Since the shared layers are
convolutional layers, the elements of the above equation should be the corresponding matrices, as in
$Z = \theta x + b$, but we consider scalar variables for simplicity. From (5), the gradient equations of both
tasks are the same except for the last factor, which is $\frac{dl_m}{dA}$ for the main task and
$\frac{dl_a}{dA}$ for the auxiliary task. So we can write $\frac{dz}{d\theta} \times \frac{dA}{dz} = F(x)$, since
$z$ and $A$ are both functions of $x$, and $\frac{dl_t}{dA} = H(l_t)$. Consequently, the common term $F(x)$ is
responsible for the correlation in the variances of these gradients, as shown in Figure 3. Generalizing the
equation for gradient calculation:

$$G_t = \frac{dl_t}{d\theta} = F(x)\,H(l_t) \qquad (6)$$


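We propose that incorporating the relative task confidence $\zeta(m)/\zeta(a)$ allows us to diminish the effect
of the part of the variance that comes from the common source, so that the only functional part of the variance
is the one coming from the task-specific losses. This claim can be illustrated by the following example.

Let $X = \{x_1, x_2, ..., x_D\}$, $L^m = \{l^m_1, l^m_2, ..., l^m_D\}$ and $L^a = \{l^a_1, l^a_2, ..., l^a_D\}$
($D$ is the size of the dataset) be a set of inputs and the corresponding main task and auxiliary task losses.
Consider a mini-batch of $n$ samples: $X_n = \{x_i \mid i \in [1, n], i \in \mathbb{N}\}$ where $X_n \subset X$;
$L^m_n = \{l^m_i \mid i \in [1, n], i \in \mathbb{N}\}$ where $L^m_n \subset L^m$; and
$L^a_n = \{l^a_i \mid i \in [1, n]\}$ where $L^a_n \subset L^a$. Let $n = 3$ (so $N = 3$ in (7)). For ease of
expressing relations, the elements of the set $F(X_n)$ are denoted by $a, b, c$ respectively. Similarly,
$H(L^m_n) = \{u, v, w\}$ and $H(L^a_n) = \{p, q, r\}$. So, from (6), the gradients can be expressed as:

$$G^m_1 = au;\quad G^m_2 = bv;\quad G^m_3 = cw;\quad G^a_1 = ap;\quad G^a_2 = bq;\quad G^a_3 = cr$$

$$\frac{\zeta(m)}{\zeta(a)} = \frac{\sqrt{E[(G^a)^2] - E[G^a]^2}}{\sqrt{E[(G^m)^2] - E[G^m]^2}} = \frac{\sqrt{N\left((G^a_1)^2 + (G^a_2)^2 + (G^a_3)^2\right) - \left(G^a_1 + G^a_2 + G^a_3\right)^2}}{\sqrt{N\left((G^m_1)^2 + (G^m_2)^2 + (G^m_3)^2\right) - \left(G^m_1 + G^m_2 + G^m_3\right)^2}}$$

$$\frac{\zeta(m)}{\zeta(a)} = \frac{\sqrt{N(a^2p^2 + b^2q^2 + c^2r^2) - (ap + bq + cr)^2}}{\sqrt{N(a^2u^2 + b^2v^2 + c^2w^2) - (au + bv + cw)^2}} \qquad (7)$$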


If the loss-dependent variable sets ($p, q, r$ in the numerator and $u, v, w$ in the denominator) have very low
variance compared to the input-dependent variable set $a, b, c$, then the change in the values of $\sigma_a$ and
$\sigma_m$ will be dominated mostly by $a, b, c$. In (7), the denominator and the numerator both contain the
input-dependent variables $a, b, c$, which consequently causes correlation between $\zeta(m)$ and $\zeta(a)$
(observed in Figure 3). The lower the variance of $p, q, r$ and $u, v, w$ (compared to the variance of
$a, b, c$), the higher the correlation will be. Let $\sigma_{abc}$ = standard deviation of the input-dependent
variables, $\sigma_{uvw}$ = standard deviation of the main task loss-dependent variables, and $\sigma_{pqr}$ =
standard deviation of the auxiliary task loss-dependent variables. Let us discuss the effect of $\sigma_{abc}$
and $\sigma_{uvw}$ on the relative task confidence $\zeta(m)/\zeta(a)$, considering that $\sigma_{pqr}$ remains
almost constant. The confidence of the main task, $\zeta(m)$, decreases when $\sigma_m$ increases. This can
happen in 3 possible cases:

Case 1: only $\sigma_{abc}$ increases. In this case the numerator and denominator both increase, resulting in
comparatively no notable change in the relative task confidence factor $\zeta(m)/\zeta(a)$. So it diminishes the
effect of high variance of the main task loss gradient if the variance is caused by input-dependent sources
(the common source).

Case 2: only $\sigma_{uvw}$ increases. In this case only the denominator increases, resulting in a significantly
lower $\zeta(m)/\zeta(a)$ value. So here $\zeta(m)/\zeta(a)$ accounts for the high variance of the main task loss
gradient significantly, and only when the variance is caused by task-specific sources (the uncommon source,
which is task loss dependent).

Case 3: both $\sigma_{abc}$ and $\sigma_{uvw}$ increase. In this case the numerator increases, but the
denominator increases more since it is a function of both $a, b, c$ and $u, v, w$, resulting in a moderately
lower value of $\zeta(m)/\zeta(a)$ (not as low as in case 2). So $\zeta(m)/\zeta(a)$ reduces the effect of high
variance of the main task loss gradient moderately, because the variance is caused by both task-loss-specific
sources and input-dependent sources.

The above 3 cases demonstrate how the relative task confidence is less affected by common sources of variance.
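
The three cases can also be checked with a small simulation of the factorization $G = F(x)\,H(l)$. The sketch below draws the shared factor and the two task-specific factors from normal distributions around 1 and reports $\zeta(m)/\zeta(a) = \sigma_a/\sigma_m$. The distributions, spreads, sample size and seed are arbitrary choices for illustration and are not taken from the experiments.

```python
import math
import random


def std(values):
    """Population (1/N) standard deviation."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def relative_confidence(s_in, s_main, s_aux, n=5000, seed=0):
    """zeta(m) / zeta(a) = sigma_a / sigma_m for gradients G = F(x) * H(l)."""
    rng = random.Random(seed)
    f = [rng.gauss(1.0, s_in) for _ in range(n)]      # shared, input-dependent factor
    h_m = [rng.gauss(1.0, s_main) for _ in range(n)]  # main-task loss factor
    h_a = [rng.gauss(1.0, s_aux) for _ in range(n)]   # auxiliary-task loss factor
    g_m = [a * u for a, u in zip(f, h_m)]
    g_a = [a * p for a, p in zip(f, h_a)]
    return std(g_a) / std(g_m)


base = relative_confidence(s_in=0.5, s_main=0.1, s_aux=0.05)
case1 = relative_confidence(s_in=1.0, s_main=0.1, s_aux=0.05)  # only the common source grows
case2 = relative_confidence(s_in=0.5, s_main=0.5, s_aux=0.05)  # only the main-task source grows
case3 = relative_confidence(s_in=1.0, s_main=0.5, s_aux=0.05)  # both grow
print(f"base={base:.3f} case1={case1:.3f} case2={case2:.3f} case3={case3:.3f}")
```

Under these assumptions the ratio barely moves when only the shared factor becomes noisier, drops the most when only the main-task factor does, and lands in between when both do, mirroring cases 1 to 3.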


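Algorithm 1. PRF algorithm


Init $\theta$, $\phi_{main}$, $\phi_{aux}$
Set $\alpha$, $\beta_m$, $\beta_a$
while epoch do
    for mini-batch in Dataset do
        $G^m_i \leftarrow \frac{dL^i_{main}}{d\theta}$; $G^a_i \leftarrow \frac{dL^i_{aux}}{d\theta}$, $\forall i \in [1, N]$, $i \in \mathbb{N}$
        $\mu_m \leftarrow \frac{1}{N}\sum_{i=1}^{N} G^m_i$; $\sigma_m^2 \leftarrow \frac{1}{N}\sum_{i=1}^{N} (G^m_i - \mu_m)^2$
        $\mu_a \leftarrow \frac{1}{N}\sum_{i=1}^{N} G^a_i$; $\sigma_a^2 \leftarrow \frac{1}{N}\sum_{i=1}^{N} (G^a_i - \mu_a)^2$
        $\rho(m, a) \leftarrow P(\mu_a|\mu_m,\sigma_m^2) / P(\mu_a|\mu_a,\sigma_a^2)$
        $\theta \leftarrow \theta - \alpha\left(\beta_m \mu_m + \rho(m, a)\,\beta_a \mu_a\right)$
        $\phi_{main} \leftarrow \phi_{main} - \alpha \beta_m \frac{1}{N}\sum_{i=1}^{N} \frac{dL^i_{main}}{d\phi_{main}}$
        $\phi_{aux} \leftarrow \phi_{aux} - \alpha \beta_a \frac{1}{N}\sum_{i=1}^{N} \frac{dL^i_{aux}}{d\phi_{aux}}$
    end for
end while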
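
The loop of Algorithm 1 can be exercised end to end on a toy problem. The sketch below is an illustration under strong simplifying assumptions: a single shared weight, no task-specific layers, squared-error losses for both tasks, synthetic data in which both targets follow the same slope, and an `eps` term that keeps the variances positive. None of these choices come from the paper.

```python
import math
import random


def mean_var(values, eps=1e-12):
    """Mean and (1/N) variance; eps keeps the variance strictly positive."""
    mu = sum(values) / len(values)
    return mu, sum((v - mu) ** 2 for v in values) / len(values) + eps


def prf_factor(g_main, g_aux):
    """rho(m, a) = P(mu_a | mu_m, var_m) / P(mu_a | mu_a, var_a) for one shared weight."""
    mu_m, var_m = mean_var(g_main)
    mu_a, var_a = mean_var(g_aux)
    return math.sqrt(var_a / var_m) * math.exp(-0.5 * (mu_a - mu_m) ** 2 / var_m)


def train(data, theta=0.0, alpha=0.05, beta_m=1.0, beta_a=0.1, epochs=50, batch=16, seed=0):
    """Toy PRF loop: one shared weight theta, squared-error main and auxiliary losses."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch):
            mb = data[start:start + batch]
            # Per-sample gradients of (theta*x - y)^2 and (theta*x - z)^2 w.r.t. theta.
            g_main = [2.0 * x * (theta * x - y) for x, y, _ in mb]
            g_aux = [2.0 * x * (theta * x - z) for x, _, z in mb]
            rho = prf_factor(g_main, g_aux)
            theta -= alpha * (beta_m * sum(g_main) / len(mb)
                              + rho * beta_a * sum(g_aux) / len(mb))
    return theta


# Synthetic data: both targets follow the same slope (2.0) with independent noise.
rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(256)]
data = [(x, 2.0 * x + rng.gauss(0.0, 0.1), 2.0 * x + rng.gauss(0.0, 0.1)) for x in xs]
theta_hat = train(data)
print(f"estimated shared weight: {theta_hat:.3f}")
```

In a real network the same factor would be computed for every shared weight from per-sample gradients, which is the main computational cost of the approach.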
3. EXPERIMENTAL RESULTS


The KITTI VO dataset is used to learn pose estimation. This dataset contains 11 sequences; sequences 0-3,
6, 8-9 have been used for training, while 4, 5, 7 and 10 have been used for validation. However, the KITTI VO
dataset does not include OF images for these sequences. We trained the FlowNetS architecture separately
on the KITTI flow dataset and used the trained FlowNet to generate OF ground truth images for the VO dataset.
OF templates are trained using root mean squared logarithmic error (RMSLE) to learn the OF subspace
effectively (Figure 4), where 4(a) and 4(b) are the consecutive image pair, 4(c) is the ground truth and 4(d) is
the predicted OF. For model training, we used (320 x 128) images with batch size 16. After hyperparameter
tuning, we found the best fitted model with learning rate 0.0005, $\beta_{trans} = 1$, $\beta_{rot} = 10$ and
$\beta_a = 0.1$. Gradient clipping has been applied to prevent overfitting. The TensorFlow and Keras DL
frameworks have been used for all experiments, on a machine with an Intel Core i7-9750H CPU @2.60GHz (12 CPUs),
16 GB RAM and an NVIDIA GeForce GTX 1660Ti GPU.


Figure 4. Optical flow subspace estimation: (a), (b) image pair, (c) ground truth, and (d) prediction




Figure 5 demonstrates (from left to right): i) the simple NN method (without ATG), ii) the vanilla ATG method,
iii) the PF method, iv) the PRF method, v) the Library for Visual Odometry 2 - Monocular (LIBVISO2-M) method, and
vi) the Oriented FAST and rotated BRIEF simultaneous localization and mapping (ORB-SLAM2) method. Here, v) [6]
and vi) are popular geometric methods commonly used for solving the VO problem. They are included to compare
the performance of the PF and PRF methods with existing geometric methods. From i) to iv), it is evident
that both the PF and PRF methods boost the performance of the simple NN as well as the vanilla ATG method by a
good margin (Figure 5, Table 2). Thus the claim regarding parameter reduction with our method is justified.
Our method also outperforms other state-of-the-art (SOTA) ATL methods, i.e., the cosine similarity-based
approach [14] and projecting conflicting gradients (PCGrad) for MTL (Table 3). For a fair comparison, we also
modified PCGrad by keeping only the auxiliary task gradients directed along the common normal, while keeping the
main task gradients unchanged. We refer to this method as PCGrad ATL; it also cannot outperform ours. These
results demonstrate the effectiveness and superiority of the PRF method as an ATG method.
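
For readers unfamiliar with the two baselines, the sketch below shows the core gradient manipulation of each on plain Python lists. It is a simplified reading of the descriptions in this paper, not the reference implementations: the cosine-similarity rule is written as a hard gate with an assumed default threshold of 0, and the PCGrad ATL variant projects only the auxiliary gradient. Function names and the threshold default are assumptions.

```python
import math


def dot(u, v):
    """Inner product of two equally long gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_gate(g_main, g_aux, threshold=0.0):
    """Keep the auxiliary gradient only if its cosine similarity with the main gradient reaches the threshold."""
    denom = math.sqrt(dot(g_main, g_main) * dot(g_aux, g_aux))
    cos = dot(g_main, g_aux) / denom if denom > 0 else 0.0
    return list(g_aux) if cos >= threshold else [0.0] * len(g_aux)


def pcgrad_atl(g_main, g_aux):
    """Project a conflicting auxiliary gradient onto the normal plane of the main gradient.

    The main gradient is left untouched; a non-conflicting auxiliary gradient is returned as is.
    """
    inner = dot(g_aux, g_main)
    if inner >= 0:
        return list(g_aux)
    scale = inner / dot(g_main, g_main)
    return [a - scale * m for a, m in zip(g_aux, g_main)]


g_main = [1.0, 0.0]
conflicting = [-1.0, 1.0]
agreeing = [1.0, 1.0]
print(cosine_gate(g_main, conflicting), pcgrad_atl(g_main, conflicting))
print(cosine_gate(g_main, agreeing), pcgrad_atl(g_main, agreeing))
```

Both rules act on whole gradient vectors, whereas PF and PRF assign a separate factor to every shared weight; this difference in granularity is what the comparison in Table 3 evaluates.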


Table 3 demonstrates the effectiveness of our method for ATG, and the comparison shows that our
method outperforms the other SOTA ATL methods. Since the proposed method is applied to the complex
problem of VO, comparison is also shown with some classic VO methods (Table 4) as well as SOTA VO
methods (Table 5). The superiority of geometric and unsupervised learning-based approaches in the field of
VO is undeniable. We acknowledge that our results are good but clearly do not beat the SOTA VO methods.
However, the goal of this paper is not to outperform all the SOTA methods of VO, but rather to show that, using
the proposed ATG method, a complex problem like monocular VO can be solved with a remarkably smaller
network while maintaining relatively competitive results (Table 5) in the supervised regime. Our inference
network for pose estimation has 9,438,630 parameters and takes about 0.031 sec on average for each
prediction. It beats most of the existing supervised methods in memory, requiring at least 5 to 20 times fewer
parameters. Most of the successful supervised methods heavily exploit the temporal relation between
frames by utilizing long sequences (i.e., of length 3-11) along with LSTM layers, fed with high-resolution images
(i.e., 1280x384) (Table 5). In contrast, we use dense layers for pose estimation and only one pair of images
(320x128), which is the key reason for our faster inference. To the best of our knowledge, no supervised
learning method achieves this level of accuracy with such a small parameter space.


It is evident that our method falls behind to some extent in terms of the translation error. This is because,
like other supervised methods, it tries to learn absolute scale automatically from training images, but in such a
low parameter space absolute scale is not learned very well. Some geometric methods take advantage of
loop closure and 7-DOF alignment with ground truth for scale correction. Future research can utilize an
additional auxiliary task, such as depth estimation, for better learning of absolute scale.
Figure 5. Comparative results of different approaches for sequence 05. Horizontal and vertical axes represent
the corresponding distances in the map.


Table 3. Comparison with existing ATL methods

Seq.   Cosine Similarity   PCGrad MTL        PCGrad ATL        PF [Ours]         PRF [Ours]
       t_rel    r_rel      t_rel    r_rel    t_rel    r_rel    t_rel    r_rel    t_rel    r_rel
04     16.49    1.64       17.46    4.70     18.54    0.84     11.76    0.91     15.32    1.13
05     10.86    3.36       14.04    5.21     9.99     2.74     9.47     1.89     6.90     1.50
07     6.35     3.41       16.19    10.81    7.95     2.54     8.02     3.72     6.91     2.52
10     16.82    3.13       25.39    6.94     16.90    3.24     13.54    4.21     11.60    3.47


Table 4. Comparison of our results with some classic methods in the field of VO

Seq.   LIBVISO2-M (G)    ORB-SLAM (G)      DeepVO (S) [17]   UnDeepVO (U) [22]   SfmLearner (U) [23]   PRF (S) [Ours]
       t_rel    r_rel    t_rel    r_rel    t_rel    r_rel    t_rel    r_rel      t_rel    r_rel        t_rel    r_rel
04     4.69     4.49     1.41     0.14     7.19     6.97     5.49     2.13       4.49     5.24         15.32    1.13
05     19.22    17.58    13.21    0.22     2.62     3.61     3.40     1.50       18.67    4.10         6.90     1.50
07     23.61    19.11    10.96    0.37     3.91     4.60     3.15     2.48       21.33    6.65         6.91     2.52
10     41.56    32.99    3.71     0.30     8.11     8.83     10.63    4.65       4.49     14.33        11.60    3.47

G: Geometric, S: Supervised, U: Unsupervised


Table 5. Comparison with SOTA supervised visual odometry methods

Arch      DeepVO            ESP-VO            GFS-VO-RNN [20]   GFS-VO            Beyond tracking [21]   PRF [Ours]
Result    t_rel    r_rel    t_rel    r_rel    t_rel    r_rel    t_rel    r_rel    t_rel    r_rel         t_rel    r_rel
Seq.04    7.19     6.97     6.33     6.08     3.93     2.36     2.91     1.30     2.96     1.76          15.32    1.13
Seq.05    2.62     3.61     3.35     4.93     5.85     2.55     3.27     1.62     2.59     1.25          6.90     1.50
Seq.07    3.91     4.60     3.52     5.02     5.88     2.64     3.37     2.25     3.07     1.76          6.91     2.52
Seq.10    8.11     8.83     9.77     10.2     7.44     3.19     6.32     2.33     3.94     1.72          11.60    3.47
Param.    463 M             463 M             **80 M            **47 M            **47 M                 9 M
Res.      1280x384          1280x384          1280x384          1280x384          1280x384               320x128
Sq.len    Arbitrary         Arbitrary         7                 7                 11                     1

Param.: ** minimum possible parameters estimated based on available information; the actual architecture may have a higher number of parameters.
4. CONCLUSION


The PF method solved the issue of instability in learning, but it is prone to diminished probability
values, which was solved by the PRF method. The PRF method additionally nullified the effect of common sources
of variance. Future work can include applying the proposed method to other fields and increasing the
computational efficiency of the proposed methods.